Analysing and Visualising Spatio-temporal Patterns of COVID-19 in DKI Jakarta, Indonesia.
This analysis aims to analyse and visualise spatio-temporal patterns of COVID-19 in DKI Jakarta, Indonesia. Out of 34 provinces in Indonesia, DKI Jakarta was the province affected most by the pandemic, with close to 24% of cumulative confirmed cases. However, the cumulative confirmed cases were not evenly distributed, therefore this analysis intends to unravel which sub-districts had the highest number of cases and how time has changed the overall distribution.
For this analysis, the following data are used:
Open Data Covid-19 Provinsi DKI Jakarta. This portal provides daily update of COVID-19 measures at both sub-district and district level. For the purpose of this analysis, data at the sub-district level is used. The datasets are in .CSV format, and monthly datasets from March 2020 to July 2021 will be used.
Indonesia Geospatial. This portal provides a comprehensive collection of geospatial data mainly in ESRI shapefile format at different geographical levels. For the purpose of this analysis, the Shapefile (SHP) Batas Desa Provinsi DKI Jakarta provided at PODES 2019 geospatial layer is used.
P.S. March 2020 Dataset only started from 25 March 2020 as per the source, so the dataset might not be a true representative of March 2020. Similarly for January 2021 Dataset, 30 January 2021 is the most updated data, the link to access 31 January 2021 is broken so this may not be a true representation of January 2021.
R packages will be used for efficiency and a more comprehensive analysis, such as tidyverse and sf etc.
load("THE1_workspace.Rdata")
packages = c("tidyverse", "sf", "readxl", "readr", "stringr", "tmap", "lemon", "formatR")
for (p in packages) {
if (!require(p, character.only = T)) {
install.packages(p)
}
library(p, character.only = T)
}
list.files function helps to create a list from the imported data files. The files are also imported all at once using a pattern to match the file names, ensuring full efficiency as compared to importing the files individually.
The R function lapply is also complementary for this process, as well as adding the file names as an additional column to the dataframes (So as to display the data by month later on).
file_list <- list.files(path = "data/the1data/COVID-DATA", pattern = "*.xlsx", full.names = T)
df_list <- lapply(seq_along(file_list), function(x) transform(read_xlsx(file_list[x]),
MonthYear = file_list[x]))
From here we then manually check through df_list to find which Meninggal column is the correct one for each file, for referencing the coalesce process later on in the next few steps (E.g. For February 2021, Meninggal…1 is the correct column to use since it is the Meninggal column with no NA values.)
The inspection tells us that Meninggal…23 to Meninggal…25 is not used so we can skip those columns later on in the coalesce process.
df_list
We will then combine df_list into a real dataframe using Idlpy function from the plyr package.
To combine/integrate values from the various Meninggal columns, we will have to convert Meninggal…26 column’s data type so we can use coalesce function later on (because it originally is a chr type and chr type cannot combine with double type).
July 2020 is using Meninggal…26 as the correct column, so we have to carry out the conversion.
df$Meninggal...26 = as.double(df$Meninggal...26)
df <- df %>%
na_if("N/A")
In this step we aim to combine ID_KEL into one main column, since some of the files have different layouts. Coalesce is a function to take in values from another column (So if ID_KEL has NA values while ID_KEL…1 has values, we will take from ID_KEL…1 and add them into ID_KEL)
df <- df %>%
mutate(ID_KEL = coalesce(ID_KEL, ID_KEL...1, ID_KEL...2))
Similarly, we want to combine menninggal together as one column, since some of the files have different layout. We know that only 28,29,30,31 is used after checking the dataset earlier on during “df_list”, so we will only integrate those columns together.
df <- df %>%
mutate(Meninggal = coalesce(Meninggal, Meninggal...28, Meninggal...29, Meninggal...30,
Meninggal...31, Meninggal...26))
Next, we will get only the required columns out from the dataframe.
So df2 is the modified dataframe, and currently this is the output. lemon_print is used for a more aesthetically pleasing dataframe.
head(df2)
| MonthYear | ID_KEL | Nama_provinsi | nama_kota | nama_kecamatan | nama_kelurahan | POSITIF | Meninggal |
|---|---|---|---|---|---|---|---|
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | NA | NA | NA | NA | TOTAL | 339735 | 5478 |
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | 3172051003 | DKI JAKARTA | JAKARTA UTARA | PADEMANGAN | ANCOL | 834 | 9 |
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | 3173041007 | DKI JAKARTA | JAKARTA BARAT | TAMBORA | ANGKE | 617 | 8 |
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | 3175041005 | DKI JAKARTA | JAKARTA TIMUR | KRAMAT JATI | BALE KAMBANG | 755 | 15 |
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | 3175031003 | DKI JAKARTA | JAKARTA TIMUR | JATINEGARA | BALI MESTER | 358 | 8 |
| data/the1data/COVID-DATA/Standar Kelurahan Data Corona (28 Februari 2021 Pukul 10.00).xlsx | 3175101006 | DKI JAKARTA | JAKARTA TIMUR | CIPAYUNG | BAMBU APUS | 870 | 13 |
As shown above, we need to clean up the MonthYear column as it is very messy. We will clean up the MonthYear column using str_replace function, to replace unnecessary characters in the values.
df2 <- df2 %>%
mutate_at("MonthYear", str_replace, "data/the1data/COVID-DATA/Standar Kelurahan Data Corona",
"")
df2 <- df2 %>%
mutate_at("MonthYear", str_replace, "[(]", "")
df2 <- df2 %>%
mutate_at("MonthYear", str_replace, "[)]", "")
df2 <- df2 %>%
mutate_at("MonthYear", str_replace, ".xlsx", "")
We will then turn MonthYear column into date data type using Sys.setlocale and as.Date functions.
Sys.setlocale(locale = "ind")
[1] "LC_COLLATE=Indonesian_Indonesia.1252;LC_CTYPE=Indonesian_Indonesia.1252;LC_MONETARY=Indonesian_Indonesia.1252;LC_NUMERIC=C;LC_TIME=Indonesian_Indonesia.1252"
Next, deleting away wrong/unnecessary rows (NA values in ID_KEL, subdistrict name as rows in ID_KEL column etc).
Sorting the modified dataframe by MonthYear and resetting of index.
With all the data cleaning done, we will end out with writing out a .rds file from final_df and utilising it further on in the analysis, so as to ensure minimum data storage space.
aspatial_df <- write_rds(final_df, "data/the1data/rds/aspatial_df.rds")
aspatial_df <- read_rds("data/the1data/rds/aspatial_df.rds")
The following code cunk imports the DKI Jakarta geospatial data into R as a simple feature dataframe.
geospatial_df <- st_read(dsn = "data/the1data/BATAS DESA DESEMBER 2019 DUKCAPIL DKI JAKARTA",
layer = "BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA")
Reading layer `BATAS_DESA_DESEMBER_2019_DUKCAPIL_DKI_JAKARTA' from data source `C:\teojp3\IS415_blog\data\the1data\BATAS DESA DESEMBER 2019 DUKCAPIL DKI JAKARTA'
using driver `ESRI Shapefile'
Simple feature collection with 269 features and 161 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: 106.3831 ymin: -6.370815 xmax: 106.9728 ymax: -5.184322
Geodetic CRS: WGS 84
st_crs(geospatial_df)
Coordinate Reference System:
User input: WGS 84
wkt:
GEOGCRS["WGS 84",
DATUM["World Geodetic System 1984",
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["latitude",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["longitude",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
ID["EPSG",4326]]
From the above output, we see that the Projected Coordinates System is WGS84 which is wrong, therefore we’ll Change the Projected Coordinates Systems to DGN95 (which is the national Projected Coordinates Systems of Indonesia).
jakarta_DGN95 <- st_transform(geospatial_df, 23845)
Outer Islands are also known as Kepulauan Seribu, we will exclude them as they are detached from the mainland.
jakarta_DGN95 <- subset(jakarta_DGN95, KAB_KOTA != "KEPULAUAN SERIBU")
For the analysis, we will only keep the first nine fields, whereby the last field is JUMLAH_PEN (Total Population).
jakarta_DGN95 <- jakarta_DGN95[, 0:9]
Some inaccuracies in the data were discovered between identification keys from aspatial and geospatial after manually checking through their data. Therefore, necessary amendments were made to the data.
If data cleaning process is not done, there will be data missing from some states (Plotted maps will have missing values).
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "BALEKAMBANG"] <- "BALE KAMBANG"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "HALIM PERDANA KUSUMA"] <- "HALIM PERDANA KUSUMAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "JATIPULO"] <- "JATI PULO"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KALIBARU"] <- "KALI BARU"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KRAMATJATI"] <- "KRAMAT JATI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PALMERIAM"] <- "PAL MERIAM"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PINANGRANTI"] <- "PINANG RANTI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PAL MERAH"] <- "PALMERAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "TENGAH"] <- "KAMPUNG TENGAH"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "PULOGADUNG"] <- "PULO GADUNG"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KALI DERES"] <- "KALIDERES"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "RAWAJATI"] <- "RAWA JATI"
jakarta_DGN95$DESA_KELUR[jakarta_DGN95$DESA == "KRENDANG"] <- "KERENDANG"
Function to help us check if the identifier keys between the aspatial and geospatial data are correct.
We will drop NA value rows, as they will only affect our analysis results.
jakarta_DGN95 <- jakarta_DGN95 %>%
drop_na(OBJECT_ID)
Functions from various packages will be used: e.g. pivot_wider() of tidyr package and group_by of dplyr package.
The formula we are using is:
COVID-19 Cases Rate per 10,000 population (Monthly) = (Total Cumulative Cases / Total Population) x 10000
Case_Rate <- aspatial_df %>%
inner_join(jakarta_DGN95, by = c(nama_kelurahan = "DESA_KELUR")) %>%
group_by(nama_kelurahan, MonthYear) %>%
dplyr::summarise(Covid_Cases_Per_10000_Pop = ((sum(POSITIF)/JUMLAH_PEN) * 10000)) %>%
ungroup() %>%
pivot_wider(names_from = MonthYear, values_from = Covid_Cases_Per_10000_Pop)
Functions from various packages will be used: e.g. pivot_wider() of tidyr package and group_by of dplyr package.
The formula we are using is:
COVID-19 Death Rate per 10,000 population (Monthly) = (Total Cumulative Deaths / Total Population) x 10000
Death_Rate <- aspatial_df %>%
inner_join(jakarta_DGN95, by = c(nama_kelurahan = "DESA_KELUR")) %>%
group_by(nama_kelurahan, MonthYear) %>%
dplyr::summarise(Death_Rate_Per_10000 = ((sum(Meninggal)/JUMLAH_PEN) * 10000)) %>%
ungroup() %>%
pivot_wider(names_from = MonthYear, values_from = Death_Rate_Per_10000)
Preparing another dataframe Positive_Cases to join Death_Rate dataframe in the upcoming processes, to add the additional columns needed for relative risk mapping later on (E.g. POSITIF and Meninggal).
Positive_Cases <- aspatial_df %>%
select("MonthYear", "nama_kelurahan", "POSITIF", "Meninggal")
Positive_Cases <- Positive_Cases[Positive_Cases$MonthYear == "2021-07-31", ]
Positive_Cases dataframe will now be integrated within Death_Rate dataframe.
The columns will be renamed and simplified for easier identification, such as removing Date value from the MonthYear values (e.g. “2021-02-28” into “02-2021”).
colnames(Case_Rate) <- c("SUB_DISTRICT", "03-2020", "04-2020", "05-2020", "06-2020",
"07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
"02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021")
colnames(Death_Rate) <- c("SUB_DISTRICT", "POSITIVE_CASES", "DEATHS", "03-2020",
"04-2020", "05-2020", "06-2020", "07-2020", "08-2020", "09-2020", "10-2020",
"11-2020", "12-2020", "01-2021", "02-2021", "03-2021", "04-2021", "05-2021",
"06-2021", "07-2021")
A code to quickly check and ensure there are no NA values.
Death_Rate dataframe is found to have NA rows, therefore we deleted them using na.omit() function.
Rds files are used as a form of best practice, so as to lessen the amount of data storage for the required data.
Case_Rate_rds <- write_rds(Case_Rate, "data/the1data/rds/Case_Rate_rds.rds")
Death_Rate_rds <- write_rds(Death_Rate, "data/the1data/rds/Death_Rate_rds.rds")
Case_Rate_rds <- read_rds("data/the1data/rds/Case_Rate_rds.rds")
Death_Rate_rds <- read_rds("data/the1data/rds/Death_Rate_rds.rds")
left_join of dplyr is used to join the geospatial data and attributes from the aspatial data using the names of sub-districts as a common identifier (“DESA_KELUR for jakarta_DGN95, and”SUB_DISTRICT" for Case/Death_Rate_rds)
Once again, checking for NA rows to ensure substantial analysis results will be attained.
Simple feature collection with 0 features and 26 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: DGN95 / Indonesia TM-3 zone 54.1
[1] OBJECT_ID KODE_DESA DESA KODE PROVINSI KAB_KOTA
[7] KECAMATAN DESA_KELUR JUMLAH_PEN 03-2020 04-2020 05-2020
[13] 06-2020 07-2020 08-2020 09-2020 10-2020 11-2020
[19] 12-2020 01-2021 02-2021 03-2021 04-2021 05-2021
[25] 06-2021 07-2021 geometry
<0 rows> (or 0-length row.names)
Simple feature collection with 0 features and 28 fields
Bounding box: xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: DGN95 / Indonesia TM-3 zone 54.1
[1] OBJECT_ID KODE_DESA DESA KODE
[5] PROVINSI KAB_KOTA KECAMATAN DESA_KELUR
[9] JUMLAH_PEN POSITIVE_CASES DEATHS 03-2020
[13] 04-2020 05-2020 06-2020 07-2020
[17] 08-2020 09-2020 10-2020 11-2020
[21] 12-2020 01-2021 02-2021 03-2021
[25] 04-2021 05-2021 06-2021 07-2021
[29] geometry
<0 rows> (or 0-length row.names)
No NA rows were found, so we are good to proceed on.
Plotting small multiples choropleth maps using KAB_KOTA field (Municipality Level, e.g. West Jakarta, North Jakarta etc). March 2020 is the first month that we have data on so we will be using March 2020 data to start this study.
tm_shape(Case_Rate_Final) + tm_fill("03-2020", style = "jenks", palette = "Blues",
thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative confirmed Case\nRate per 10,000 Population (March 2020)",
main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

From the plotted map, we can see that there is an even distribution throughout the different municipalities (no particular municipality with outlandish number of cases).
But Gelora in Tanah Abang, Jakarta Pusat, and the area surrounding Senayan in Kebayoran Baru, Jakarta Selatan seemingly has slightly more cases than the other sub-districts as shown by their darker blue shading.
tm_shape(Death_Rate_Final) + tm_fill("03-2020", style = "jenks", palette = "Greens",
thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative Death Rate\nper 10,000 Population (March 2020)",
main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

Similar to case rate, we can see that there is a seemingly even distribution throughout the different districts (no particular district with outlandish number of death cases).
However, the sub-districts of Gondangdia in Mentang, Jakarta Pusat and Karet Semanggi of Setiabudi, Jakarta Selatan has slightly more death cases than the other sub-districts.
July 2021 is the last month that we have data on, so we will be using it to see the change in distribution from the first to the last month.
tmap_mode("plot")
tm_shape(Case_Rate_Final) + tm_fill("07-2021", style = "jenks", palette = "Blues",
thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative confirmed Case\nRate per 10,000 Population (July 2021)",
main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

We won’t be without taking into account the difference in number of cases from March 2020 to July 2020, this visualisation simply serves as a mean for comparing the distribution of cases across Jakarta.
Similar to the first month, an even distribution is seen throughout the different districts.
However, there is a concentrated subdistrict Gambir in Gambir, Jakarta Pusat with a much higher number of cases than the other sub-districts as shown by the darker blue shading.
tm_shape(Death_Rate_Final) + tm_fill("07-2021", style = "jenks", palette = "Greens",
thres.poly = 0) + tm_facets(by = "KAB_KOTA", free.coords = TRUE, drop.shapes = TRUE) +
tm_layout(legend.show = TRUE, main.title = "Distribution of cumulative Death Rate\nper 10,000 Population (July 2021)",
main.title.position = "left", title.size = 12) + tm_borders(alpha = 0.5)

Similar to case rate, we can see that there is a seemingly even distribution throughout the different sub-districts of Jakarta.
However, the subdistrict of Gambir in Gambir, Jakarta Pusat is an outlier as well, with a much higher number of death cases than the other sub-districts as shown by the darker green shading.
P.S. As mentioned earlier on, these visualisations does not take into account the difference in number of cases from March 2020 to July 2020, we are only comparing the change in distribution of cases across Jakarta.
To take into account the increase in total cases, time-series choropleth maps will be used to do this analysis.
Firstly, summary function is used to find the minimum and highest value from Case_Rate_Final “07-2021” (AKA column with the most updated number of cumulative positive cases), so as to determine the appropriate category breakpoints.
summary(Case_Rate_Final$`07-2021`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
187.3 545.1 658.3 705.3 779.7 3808.3
With reference to the results above, the breakpoints for our breaks vector will be c(0, 500, 1000, 1500, 2000, 2500, 3000).
We will plot small multiple maps to see how the distribution of COVID cases has changed on a monthly basis from March 2020 to July 2021.
tm_shape(Case_Rate_Final) + tm_polygons(c("03-2020", "04-2020", "05-2020", "06-2020",
"07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
"02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021"), breaks = c(0,
500, 1000, 1500, 2000, 2500, 3000), palette = "Reds") + tm_layout(panel.show = TRUE,
panel.labels = c("03-2020", "04-2020", "05-2020", "06-2020", "07-2020", "08-2020",
"09-2020", "10-2020", "11-2020", "12-2020", "01-2021", "02-2021", "03-2021",
"04-2021", "05-2021", "06-2021", "07-2021"), panel.label.size = 3, asp = NA,
legend.show = FALSE)


Similar to the above visualisation, but now we will look at an estimated 5-month change in time period (Splitting the 17 months into 4 parts, which equals ~5) to see how the distribution of COVID cases has changed from March 2020 to July 2021.
tm_shape(Case_Rate_Final) + tm_polygons(c("03-2020", "08-2020", "02-2021", "07-2021"),
breaks = c(0, 500, 1000, 1500, 2000, 2500, 3000), palette = "Reds") + tm_layout(panel.show = TRUE,
panel.labels = c("03-2020", "08-2020", "02-2021", "07-2021"), panel.label.size = 3,
asp = NA, legend.show = FALSE)


To find the minimum and highest value from Death_Rate_Final “07-2021” (AKA column with the most updated number of cumulative death cases).
summary(Death_Rate_Final$`07-2021`)
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.439 8.446 10.298 10.602 12.508 42.098
Plotting small multiple maps to see how the distribution of deaths has changed on a monthly basis from March 2020 to July 2021.
tm_shape(Death_Rate_Final) + tm_polygons(c("03-2020", "04-2020", "05-2020", "06-2020",
"07-2020", "08-2020", "09-2020", "10-2020", "11-2020", "12-2020", "01-2021",
"02-2021", "03-2021", "04-2021", "05-2021", "06-2021", "07-2021"), breaks = c(0,
5, 10, 15, 20, 25, 30, 35, 40), palette = "Purples") + tm_layout(panel.show = TRUE,
panel.labels = c("03-2020", "04-2020", "05-2020", "06-2020", "07-2020", "08-2020",
"09-2020", "10-2020", "11-2020", "12-2020", "01-2021", "02-2021", "03-2021",
"04-2021", "05-2021", "06-2021", "07-2021"), panel.label.size = 3, asp = NA,
legend.show = FALSE)

Similar to the above visualisation, but with an estimated 5-month change in time period (Splitting the 17 months into 4 parts, which equals ~5) to see how the distribution of deaths from COVID has changed from March 2020 to July 2021.
tm_shape(Death_Rate_Final) + tm_polygons(c("03-2020", "08-2020", "02-2021", "07-2021"),
breaks = c(0, 5, 10, 15, 20, 25, 30, 35, 40), palette = "Purples") + tm_layout(panel.show = TRUE,
panel.labels = c("03-2020", "08-2020", "02-2021", "07-2021"), panel.label.size = 3,
legend.show = FALSE)

tmap_mode("view")
tm_shape(Death_Rate_Final) + tm_fill("03-2020", breaks = c(0, 4, 13, 53, 119, 270,
395, 3808), palette = "Reds", title = "Cumulative Case Rate") + tm_layout(main.title = "Cumulative Confirmed Case Rate in March 2020",
main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_borders(alpha = 0.5) +
tm_compass(type = "8star", size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
"bottom")) + tm_grid(alpha = 0.2)
PERCENTILE MAP????
percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- Case_Rate_Final["07-2021"] %>%
st_set_geometry(NULL)
quantile(var[, 1], percent)
0% 1% 10% 50% 90% 99% 100%
187.3189 286.2999 458.1759 658.3395 939.6568 1966.0003 3808.2902
percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- get.var("07-2021", Case_Rate_Final)
bperc <- quantile(var, percent)
tm_shape(jakarta_DGN95) + tm_polygons() + tm_shape(Case_Rate_Final) + tm_fill("07-2021",
title = "07-2021", breaks = bperc, palette = "Blues", labels = c("< 1%", "1% - 10%",
"10% - 50%", "50% - 90%", "90% - 99%", "> 99%")) + tm_borders() + tm_layout(title = "Percentile Map",
title.position = c("right", "bottom"))
percentmap <- function(vnam, df, legtitle = NA, mtitle = "Percentile Map") {
percent <- c(0, 0.01, 0.1, 0.5, 0.9, 0.99, 1)
var <- get.var(vnam, df)
bperc <- quantile(var, percent)
tm_shape(jakarta_DGN95) + tm_polygons() + tm_shape(df) + tm_fill(vnam, title = legtitle,
breaks = bperc, palette = "Blues", labels = c("< 1%", "1% - 10%", "10% - 50%",
"50% - 90%", "90% - 99%", "> 99%")) + tm_borders() + tm_layout(main.title = "Cumulative Confirmed Case Rate in July 2021",
main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_compass(type = "8star",
size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
"bottom"))
}
CR_July2021_percentmap <- percentmap("07-2021", Case_Rate_Final)
CR_March2020_percentmap <- percentmap("03-2020", Case_Rate_Final)
CR_March2020_percentmap
boxbreaks <- function(v, mult = 1.5) {
qv <- unname(quantile(v))
iqr <- qv[4] - qv[2]
upfence <- qv[4] + mult * iqr
lofence <- qv[2] - mult * iqr
# initialize break points vector
bb <- vector(mode = "numeric", length = 7)
# logic for lower and upper fences no lower outliers
if (lofence < qv[1]) {
bb[1] <- lofence
bb[2] <- floor(qv[1])
} else {
bb[2] <- lofence
bb[1] <- qv[1]
}
if (upfence > qv[5]) {
# no upper outliers
bb[7] <- upfence
bb[6] <- ceiling(qv[5])
} else {
bb[6] <- upfence
bb[7] <- qv[5]
}
bb[3:5] <- qv[2:4]
return(bb)
}
July2021_No_Zero <- Case_Rate_Final %>%
filter("07-2021" >= 0)
var <- get.var("07-2021", Case_Rate_Final)
boxbreaks(var)
[1] 187.3189 193.3497 545.1432 658.3395 779.6722 1131.4658
[7] 3808.2902
boxmap <- function(vnam, df, legtitle = NA, mtitle = "Box Map", mult = 1.5) {
var <- get.var(vnam, df)
bb <- boxbreaks(var)
tm_shape(df) + tm_fill(vnam, title = legtitle, breaks = bb, palette = "Blues",
labels = c("lower outlier", "< 25%", "25% - 50%", "50% - 75%", "> 75%", "upper outlier")) +
tm_borders() + tm_layout(main.title = "Cumulative Confirmed Case Rate in July 2021",
main.title.position = "center", main.title.size = 1.2, legend.outside = TRUE,
legend.height = 0.45, legend.width = 0.35, frame = TRUE) + tm_compass(type = "8star",
size = 2, position = c("left", "bottom")) + tm_scale_bar(position = c("left",
"bottom"))
}
boxmap("07-2021", Case_Rate_Final)
# Jakarta_Death_Final <- Jakarta_Death_Final %>%
# group_by('Expected_Death',Sub_District) %>% dplyr::summarise(`SMR` =
# (Death)/(Expected_Death)) %>% ungroup() Jakarta_Death_Final <-
# Jakarta_Death_Final %>% group_by(Sub_District,SMR) %>%
# dplyr::summarise(`SMR_Avg` = sum(SMR)/15)